An active crawler for discovering geospatial Web services and their distribution pattern - A case study of OGC Web Map Service

Authors

  • Wenwen Li
  • Chaowei Phil Yang
  • Chongjun Yang
Abstract

The increased popularity of standards for geospatial interoperability has led to an increasing number of geospatial Web services (GWSs), such as Web Map Services (WMSs), becoming publicly available on the Internet. However, finding these services quickly and precisely is still a challenge. Traditional methods collect services through centralized registries, where services can be manually registered, but the metadata of registered services cannot be updated in a timely manner. This paper addresses these challenges by developing an effective crawler to discover and update the services by (1) proposing an accumulated term frequency (ATF)–based conditional probability model for prioritized crawling, (2) utilizing a concurrent multi-threading technique, and (3) adopting an automatic mechanism to update the metadata of identified services. Experiments show that the proposed crawler achieves good performance in both crawling efficiency and the coverage and liveliness of its results. In addition, an interesting finding regarding the distribution pattern of WMSs is discussed. We expect this research to contribute to automatic GWS discovery over the large-scale and dynamic World Wide Web and to the promotion of operational, interoperable, distributed geospatial services.

1. Introduction

Advances in geospatial information acquisition methods make it possible to collect huge amounts of geospatial information. In 2006 alone, NASA's Earth Observing System Data and Information System (EOSDIS) produced over 3 terabytes (TB) of Earth system science data on a daily basis (NASA 2007). Geospatial information is widely used in different applications, such as navigation (Rae-Dupree 2006), transportation (Peytchev and Claramunt 2001), urban planning (Stevens et al. 2007), and emergency response (Rauschert et al. 2002). However, the data are archived in various forms, and the geospatial applications, provided by different vendors, are highly heterogeneous in data representation, storage, and access (Paul and Ghosh 2006). The heterogeneity makes it difficult …

Similar resources

GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources

With the advance of World-Wide Web (WWW) technology, people can easily share content on the Web, including geospatial data and web services. As a result, “big geospatial data management” issues have started attracting attention. Among the big geospatial data issues, this research focuses on discovering distributed geospatial resources. As resources are scattered on the WWW, users cannot find resource...

Automatic Generation of Geospatial Metadata for Web Resources

Web resources that are not part of any Spatial Data Infrastructure can be an important source of information. However, the incorporation of Web resources within a Spatial Data Infrastructure requires a significant effort to create metadata. This work presents an extensible architecture for an automatic characterisation of Web resources and a strategy for assignation of their geographic scope. T...

Publication and Discovery of Semantically Annotated Geospatial Web Services

Environmental information and services have become a crucial asset in the creation of decision support systems. Unfortunately, this information and these services are not usually exposed in an interoperable and standard way, limiting their reusability and impact in the community. Publishing and discovering geospatial information and services on the Web is therefore an important challenge in order to...

Interpolation of Precipitation Sensor Measurements using OGC Web Services

The standards developed by the Open Geospatial Consortium (OGC) provide a broad foundation for web-based geographical applications. As one of these, the OGC Sensor Web Enablement (SWE), which comprises standards for the consumer- and provider-oriented sensor viewpoint, allows data requests of real-time sensor measurements and observations, abstracting from the inherent sensor particularities. The ...

Information Services for Grid/Web Service Oriented Architecture (SOA) Based Geospatial Applications

Geographical Information Systems (GIS) present a data-intensive environment for acquiring, processing and sharing geo-data among interested parties. In order to serve geographical information to users in such an environment, Service Oriented Architecture (SOA) principles have gained great importance. In SOA-based systems, Information Services support the discovery and handling of these geospatial s...

Journal title:
  • International Journal of Geographical Information Science

Volume 24  Issue 

Pages  -

Publication date 2010